Application of reinforcement learning to balancing of acrobot
Authors
Abstract
The acrobot is a two-link robot, actuated only at the joint between the two links. Controlling the acrobot is one of the difficult tasks in reinforcement learning (RL) because it has nonlinear dynamics and continuous state and action spaces. In this article, we discuss applying RL to the task of balancing control of the acrobot. Our RL method has an architecture similar to the actor-critic. The actor and the critic are approximated by normalized Gaussian networks, which are trained by an on-line EM algorithm. We also introduce eligibility traces for our actor-critic architecture. Our computer simulation shows that our method is able to achieve fairly good control with a small number of trials.

1. INTRODUCTION

Humans are able to acquire control of their bodies through trial and error, without detailed knowledge of body dynamics. Reinforcement learning (RL) is a machine learning scheme of this kind. RL methods have been successfully applied to various Markov decision problems that have finite state/action spaces, such as the game of backgammon [1]. On the other hand, applications to control problems of human or robot motion are much more difficult because the state and action spaces in these systems are continuous. In such cases, a good function approximator and a fast learning algorithm are crucial to achieving good performance.

In our previous paper [2], we proposed an RL method based on our previously developed on-line EM algorithm [3] and applied it to a couple of control problems in which the state and action spaces were continuous. In this article, we discuss applying this method to the task of balancing control of an acrobot [4, 5], a two-link underactuated robot roughly analogous to a gymnast swinging on a high bar.

Our model has an architecture similar to the actor-critic model proposed by Barto et al. [6]. The actor yields a control signal for the current state, and the critic predicts the accumulation of rewards given in the future. However, in our model, the detailed implementations of the actor and the critic are quite different from those of the original model. The actor learns a control that maximizes the output of the current critic. The critic learns the Q-function that follows the current actor, based on the Bellman equation [7]. The actor and the critic are approximated by normalized Gaussian networks (NGnets) [8], which are …
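The excerpt stops mid-sentence before the NGnet definition, but as a rough illustration of the function approximator the abstract names, here is a minimal sketch of a normalized Gaussian network: a soft mixture of local linear models whose Gaussian activations are normalized to sum to one. All identifiers below are ours, the covariances are simplified to isotropic, and the on-line EM training the paper actually uses is not shown.

```python
import numpy as np

class NGnet:
    """Normalized Gaussian network: a soft mixture of local linear models.

    f(x) = sum_i p_i(x) * (W_i @ x + b_i), where p_i(x) = G_i(x) / sum_j G_j(x)
    and G_i is a Gaussian bump centered at mu_i.
    """

    def __init__(self, n_units, in_dim, out_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.mu = rng.standard_normal((n_units, in_dim))        # Gaussian centers
        self.sigma2 = np.ones(n_units)                          # isotropic variances (assumption)
        self.W = 0.1 * rng.standard_normal((n_units, out_dim, in_dim))
        self.b = np.zeros((n_units, out_dim))

    def weights(self, x):
        # Gaussian activations G_i(x), normalized so they sum to one.
        d2 = np.sum((x - self.mu) ** 2, axis=1)
        g = np.exp(-0.5 * d2 / self.sigma2)
        return g / (g.sum() + 1e-12)

    def __call__(self, x):
        p = self.weights(x)                                     # (n_units,)
        local = np.einsum('koi,i->ko', self.W, x) + self.b      # local linear outputs
        return p @ local                                        # softly blended prediction
```

In the paper, both the actor (state to control signal) and the critic (state-action pair to Q-value) would be networks of this form; the sketch only shows the forward pass.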
Related papers
Application of reinforcement learning based on on-line EM algorithm to balancing of acrobot
Efficient reinforcement learning: model-based Acrobot control
Several methods have been proposed in the reinforcement learning literature for learning optimal policies for sequential decision tasks. Q-learning is a model-free algorithm that has recently been applied to the Acrobot, a two-link arm with a single actuator at the elbow that learns to swing its free endpoint above a target height. However, applying Q-learning to a real Acrobot may be impractical ...
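For context on the model-free baseline this abstract contrasts with its model-based approach, here is a minimal sketch of the tabular Q-learning update; the discretization of the Acrobot's continuous state into indices s and s_next is assumed, and all names are illustrative.

```python
import numpy as np

def q_learning_step(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One model-free Q-learning update on a tabular Q array of shape
    (n_states, n_actions): move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    td_target = r + gamma * np.max(Q[s_next])   # bootstrap from the greedy next value
    Q[s, a] += alpha * (td_target - Q[s, a])    # step toward the TD target
    return Q
```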
Brain Inspired Reinforcement Learning
Successful application of reinforcement learning algorithms often involves considerable hand-crafting of the necessary non-linear features to reduce the complexity of the value functions and hence to promote convergence of the algorithm. In contrast, the human brain readily and autonomously finds the complex features when provided with sufficient training. Recent work in machine learning and neuroscience ...
High-accuracy value-function approximation with neural networks applied to the acrobot
Several reinforcement-learning techniques have already been applied to the Acrobot control problem, using linear function approximators to estimate the value function. In this paper, we present experimental results obtained by using a feedforward neural network instead. The learning algorithm used was model-based continuous TD(λ). It generated an efficient controller, producing a high-accuracy ...
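The network and learning-rate details are not given in this excerpt; as a rough sketch of the TD(λ) mechanics the abstract refers to, here is the discrete-time update for a linear value function with an eligibility trace over its parameters. For the paper's feedforward network, phi_s would be replaced by the gradient of the network output with respect to its weights; all names and defaults below are assumptions.

```python
import numpy as np

def td_lambda_step(w, e, phi_s, phi_s_next, r, alpha=0.01, gamma=0.99, lam=0.9):
    """One TD(lambda) step for V(s) = w @ phi(s), with eligibility trace e."""
    delta = r + gamma * (w @ phi_s_next) - (w @ phi_s)   # TD error
    e = gamma * lam * e + phi_s                          # decay traces, mark current features
    w = w + alpha * delta * e                            # assign credit along the trace
    return w, e
```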
Toward Nonlinear Local Reinforcement Learning Rules Through Neuroevolution
We consider the problem of designing local reinforcement learning rules for artificial neural network (ANN) controllers. Motivated by the universal approximation properties of ANNs, we adopt an ANN representation for the learning rules, which are optimized using evolutionary algorithms. We evaluate the ANN rules in partially observable versions of four tasks: the mountain car, the acrobot, the ...
Publication date: 1999